We give an algorithm for the hidden subgroup problem for the dihedral group $D_N$, or equivalently the cyclic
hidden shift problem, that supersedes our first algorithm and is suggested by Regev's algorithm. It runs in
$\exp(O(\sqrt{\log N}))$ quantum time and uses $\exp(O(\sqrt{\log N}))$ classical space, but only $O(\log N)$ quantum space.
The algorithm also runs faster with quantumly addressable classical space than with fully classical space. In the
hidden shift form, which is more natural for this algorithm regardless, it can also make use of multiple hidden
shifts. It can also be extended with two parameters that trade classical space and classical time for quantum time.
At the extreme space-saving end, the algorithm becomes Regev's algorithm. At the other end, if the algorithm
is allowed classical memory with quantum random access, then many trade-offs between classical and quantum
time are possible.



1. INTRODUCTION 

In a previous article [7], we established a subexponential-time
algorithm for the dihedral hidden subgroup problem,
which is equivalent to the abelian hidden shift problem. That
algorithm requires $\exp(O(\sqrt{\log N}))$ time, queries, and quantum
space to find the hidden shift $s$ in the equation $g(x) = f(x+s)$,
where $f$ and $g$ are two injective functions on $\mathbb{Z}/N$. In
this article we present an improved algorithm, Algorithm 4.4,
which is much less expensive in space, as well as faster in a
heuristic model. Our algorithm was inspired by and generalizes
Regev's algorithm [10]. It uses $\exp(O(\sqrt{\log N}))$ classical
space, but only $O(\log N)$ quantum space. We heuristically
estimate a total computation time of $O(2^{\sqrt{2\log_2 N}})$ for the new
algorithm; the old algorithm takes time $\tilde{O}(3^{\sqrt{2\log_3 N}})$.

The algorithm also has two principal adjustable parameters.
One parameter allows the algorithm to use less space and more
quantum time. A second parameter allows the algorithm to
use more classical space and classical time and less quantum
time, if the classical space has quantum access [5]. (See also
Section 2.) Finally, the new algorithm can take some advantage
of multiple hidden shifts; somewhat anomalously, our old
algorithm could not.

The new algorithm can be called a collimation sieve. As 
in the original algorithm and Regev's algorithm, the weak 
Fourier measurement applied to a quantum query of the hid- 
ing function yields a qubit whose phases depend on the hidden 
shift s. The sieve makes larger qudits from the qubits which 
we call phase vectors. It then collimates the phases of the 
qudits with partial measurements, until a qubit is produced 
whose measurement reveals the parity of s. We also use a 
key idea from Regev's algorithm to save quantum space. The 
sieve is organized as a tree with $O(\sqrt{\log N})$ stages, and we
can traverse the tree depth first rather than breadth first. The
algorithm still uses a lot of classical space to describe the
coefficients of each phase vector when it lies in a large qudit. If
the qudit has dimension $\ell$, then this is only $O(\log \ell)$ quantum
space, but the classical description of its phases requires $O(\ell)$
space.

The main discussion of the dihedral hidden subgroup problem
has been in the setting of a black-box hiding function.
Recently Childs, Jao, and Soukharev [4] found a classical,
white-box instance of the dihedral hidden subgroup problem,
or the abelian hidden shift problem. The instance is that
an isogeny between isogenous, ordinary elliptic curves can be
interpreted as a hidden shift on a certain abelian group. Thus,
just as Shor's algorithm allows quantum computers to factor
large numbers, an abelian hidden shift algorithm allows quantum
computers to find isogenies between large elliptic curves.
This is a new impetus to study algorithms for the dihedral hidden
shift problem.

Before describing the algorithm, we review certain points 
of quantum complexity theory in general, and quantum algo- 
rithms for hidden structure problems. We adopt the general 
convention that if $X$ is a finite set of orthonormal vectors in a
Hilbert space (but not necessarily a basis), then

$$|X\rangle = \frac{1}{\sqrt{|X|}} \sum_{x \in X} x$$

is the constant pure state on $X$. Also if $X$ is an abstract finite
set, then $\mathbb{C}[X]$ is the Hilbert space in which $X$ is an orthonormal
basis. Also we use the notation

$$[n] = \{0, 1, \ldots, n-1\},$$

so that $\mathbb{C}[[n]]$ becomes another way to write the vector space
$\mathbb{C}^n$.

2. QUANTUM TIME AND SPACE

As with classical algorithms, the computation "time" of a 
quantum algorithm can mean more than one thing. One model 
of quantum computation is a quantum circuit that consists of 
unitary operators and measurements, or even general quantum 
operations, and is generated by a classical computer. (It could
be adaptively generated using quantum measurements.) Then
the circuit depth is one kind of quantum time, a type of parallel 
time. The circuit gate complexity is another kind of quantum 
time, a type of serial time. We can justify serial quantum time 
with the following equivalence with a RAM-type machine. 

Proposition 2.1. The gate complexity of a classically uniform 
family of quantum circuits is equivalent, up to a constant fac- 
tor, to the computation time of a RAM-type machine with a 
classical address register, a quantum data register, a classical 
tape, and a quantum tape. 



We will discuss Proposition 2.1 more rigorously in Section 2.1.
From either the circuit viewpoint or the RAM machine
viewpoint, serial computation time is a reasonable cost
model: in practice, gate operations are more expensive than
simple memory multiplied by clock time.

An interesting and potentially important variation of the 
random-access model is quantum random access memory, or 
QRAM [5]. In this model, there is an address register composed
of qubits, and a memory can be accessed in quantum
superposition, whether or not the cells of the memory tape 
are classical. Of course, if the memory is classical, only read 
operations can be made in quantum superposition. A RAM 
quantum computer thus has four possible types of memory 
tapes: classical access classical memory (CRACM), quantum 
access classical memory (QRACM), classical access quantum 
memory (CRAQM), and quantum access quantum memory 
(QRAQM). 

Hypothetically, one could cost quantum access classical 
memory (QRACM) simply as quantum memory. But for 
all we know, quantum access classical memory (QRACM) 
and classical-access quantum memory (CRAQM) are noncomparable
resources. We agree with the suggestion [3]
that quantum-access classical memory could be cheaper than 
quantum memory with either classical or quantum access. Af- 
ter all, such memory does not need to be preserved in quantum 
superposition. Our own suggestion for a QRACM architecture 
is to express classical data with a 2-dimensional grid of pix- 
els that rotate the polarization of light. (A liquid crystal dis- 
play has a layer that does exactly that.) When a photon passes 
through such a grid, its polarization qubit reads the pixel grid 
in superposition. Such an architecture seems easier to con- 
struct than an array of full qubits. 

A good example of an algorithm that uses QRACM is the
Brassard-Høyer-Tapp algorithm for the 2-to-1 collision
problem [3], as the authors themselves point out. Given a function
$f: X \to Y$, where $X$ has $N$ elements, the algorithm generates
$N^{1/3}$ values of $f$ at random and then uses a Grover search
over $N^{2/3}$ values to find a collision; thus the time complexity
is $O(N^{1/3})$. This is a large-memory algorithm, but the bulk of
the memory only needs to be quantumly addressable classical
memory. By contrast, Ambainis' algorithm [2] for the single
collision problem uses true quantum memory.
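The division of labor in the Brassard-Høyer-Tapp algorithm can be sketched classically. In the minimal Python sketch below (the function name is hypothetical), the table of about $N^{1/3}$ sampled values is the part that BHT would keep in QRACM. The second phase is a plain linear scan standing in for the Grover search, so the sketch shows only the data flow, not the $O(N^{1/3})$ quantum speedup. It assumes that $f$ is 2-to-1 on `range(n_inputs)`.

```python
import random


def bht_collision_classical(f, n_inputs, seed=0):
    """Classical stand-in for the two phases of Brassard-Hoyer-Tapp.

    Assumes f is 2-to-1 on range(n_inputs). Returns a colliding pair.
    """
    rng = random.Random(seed)
    k = max(1, round(n_inputs ** (1 / 3)))
    sample = rng.sample(range(n_inputs), k)

    # Phase 1: tabulate about N^(1/3) values. In BHT this table is the
    # large memory, and it only needs to be QRACM (read in superposition).
    table = {}
    for x in sample:
        y = f(x)
        if y in table:          # lucky collision inside the sample
            return table[y], x
        table[y] = x

    # Phase 2: find an unsampled input whose value is already tabulated.
    # BHT uses Grover search here; this linear scan is only a placeholder.
    sampled = set(sample)
    for x in range(n_inputs):
        if x not in sampled and f(x) in table:
            return table[f(x)], x
    return None
```

A collision always exists in phase 2, because the partner of every tabulated input is somewhere in the domain.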

Proposition 2.2. In the RAM model, a quantum access mem- 
ory with N quantum or classical cells can be simulated with 
a classical linear access memory, with the same cells, with
$O(N)$ time overhead.



2.1. Some rigor 

Here we give more precise definitions of quantum RAM 
machine models, and we argue Propositions 2.1 and 2.2. We
would like models that have no extraneous polynomial over- 
head, although they might have polylogarithmic overhead. On 
the other hand, it seems very difficult to regularize polyloga- 
rithmic overhead. In our opinion, different models of compu- 
tation that differ in polylogarithmic overhead could be equally 
good. Actually, at some level a physical computer has at 
most the computational strength of a 3-dimensional cellular 
automaton, where again, the total number of operations is as 
important as the total clock time. (Or even a 2-dimensional 
cellular automaton; a modern computer is approximately a 
2-dimensional computer chip.) Procedural programming lan- 
guages typically create a RAM machine environment, but usu- 
ally with polylogarithmic overhead that depends on various 
implementation details. 

A classical Turing machine $M$ is a tuple $(S, \Gamma, \delta)$, where $S$ is
a finite set of states, $\Gamma$ is a finite alphabet, and $\delta$ is a transition
map. The Turing machine has a tape which is linear in one
direction with a sequence of symbols in $\Gamma$, which initially are
all the blank symbol $b \in \Gamma$ except for an input written in the
alphabet $\Sigma = \Gamma \setminus \{b\}$. The state set $S$ includes an initial state,
a "yes" final state, and a "no" final state. Finally the transition
map $\delta$ instructs the Turing machine to change state, write to
the tape, and move along the tape by one unit.

In one model of a RAM machine, it is a Turing machine $M$
with two tapes: an address tape $T_a$ with the same rules as a
usual linear tape, and a main work tape $T_w$. The machine $M$
(as instructed by $\delta$) can now also read from or write to $T_w(T_a)$,
meaning the cell of the tape $T_w$ at the address expressed in
binary (or some other radix) on the tape $T_a$. It is known [9]
that a RAM machine in this form is polylog equivalent to
a tree Turing machine, meaning a standard Turing machine
whose tape is an infinite rooted binary tree.

It is useful to consider an intermediate model in which the
transition map $\delta$ is probabilistic, i.e., a stochastic matrix rather
than a function. (Or a substochastic matrix rather than a par- 
tial function.) Then the machine M arrives at either answer, 
or fails to halt, with a well-defined probability. This is a non- 
deterministic Turing machine, but it can still be called classi- 
cal computation, since it is based on classical probability. 

One workable model of a RAM quantum computer is all of
the above, except with two work tapes $T_c$ and $T_q$, and a register
(a single ancillary cell) $R_q$. In this model, each cell of $T_q$
has the Hilbert space $\mathbb{C}[\Gamma]$, and the cell $R_q$ does as well. The
machine $M$ can apply a joint unitary operator (or a TPCP map) to
the state of $R_q$ and the state of the cell of $T_q$ at the classical
address in $T_a$. Or it can decide its next state in $S$ by measuring
the state in $R_q$. Or it can do some classical computation using
the classical tape $T_c$ to decide what to do next. All of this
can be arranged so that $\delta$ is a classical stochastic map (which
might depend on quantum measurements), $T_a$ and $T_c$ are classical
but randomized, and all of the quantum nondeterminism
is only in the tape $T_q$ and the register $R_q$. In some ways this
model is more complicated than necessary, but it makes it easy
to keep separate track of quantum and classical resources. $T_c$
is a CRACM and $T_q$ is a CRAQM.

Proposition 2.1 is routine in this more precise model. The
machine can create a quantum circuit drawn from a uniform
family using $T_c$ and $T_q$. Either afterwards or as it creates the
circuit, it can implement it with unitary operations or quantum 
operations on Tq and Rq. Finally it can measure Rq to decide 
or help decide whether to accept or reject the input. At lin- 
ear time or above, it doesn't matter whether the input is first 
written onto Tc or Tq. 

The basic definition of quantum addressability is to assume
that the address tape $T_a$ is instead a quantum tape. For simplicity,
we assume some abelian group structure on the alphabet $\Gamma$.
Then adding the value of $T_c(T_a)$ to $R_q$ is a well-defined unitary
operator on the joint Hilbert space of $T_a$ and $R_q$; in fact
it is a permutation operator. This is our model of QRACM.
Analogously, suppose that we choose a unitary operator $U_{qr}$
that would act on the joint state of $T_q(T_a)$ and $R_q$ if $T_a$ were
classical. Then it yields a unitary operator $U_{qar}$ on the joint
state of $T_q$, $T_a$, and $R_q$ that, in superposition, applies $U_{qr}$ to
$T_q(T_a)$ and $R_q$. This is a valid model of QRAQM.

To prove Proposition 2.2, we assume that $T_c$ can no longer
be addressed with $T_a$, and that instead the Turing machine
has a position $n$ on the tape $T_c$ that can be incremented or
decremented. Then to emulate a quantum read of $T_c(T_a)$, the
machine can step through the tape $T_c$ and add $T_c(n)$ to $R_q$ on
the quantum condition that $n$ matches $T_a$. This is easiest to
do if the machine has an auxiliary classical tape that stores $n$
itself. Even otherwise, the machine could space the data on
$T_c$ so that it only uses the even cells, and with logarithmic
overhead drag the value of $n$ itself on the odd cells.
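To make the QRACM model and the proof of Proposition 2.2 concrete, here is a toy dense-matrix sketch in Python with NumPy. It assumes that the alphabet is $\mathbb{Z}/q$ and that basis states of the address and value registers are indexed as `a * q + r`; the function names are hypothetical. `qracm_read` is the permutation operator that adds $T_c(T_a)$ to $R_q$. `linear_scan_read` is the emulation that steps through $T_c$ once with one conditional addition per cell, which matches the $O(N)$ overhead.

```python
import numpy as np


def qracm_read(tape, q):
    """Permutation matrix of |a, r> -> |a, r + tape[a] mod q>.

    Basis index is a * q + r: address register a, value register r
    with alphabet Z/q (the abelian group structure assumed on Gamma).
    """
    L = len(tape)
    U = np.zeros((L * q, L * q))
    for a in range(L):
        for r in range(q):
            U[a * q + (r + tape[a]) % q, a * q + r] = 1.0
    return U


def linear_scan_read(tape, q):
    """Same read, emulated by a classical head n stepping along the tape.

    At step n the machine adds tape[n] to the value register only on the
    quantum condition that the address register equals n.
    """
    L = len(tape)
    U = np.eye(L * q)
    for n in range(L):
        step = np.zeros((L * q, L * q))
        for a in range(L):
            for r in range(q):
                shift = tape[n] if a == n else 0
                step[a * q + (r + shift) % q, a * q + r] = 1.0
        U = step @ U
    return U
```

A dense matrix is exponentially wasteful, of course; it only serves to check on a few cells that the two operators agree.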

3. HIDE AND SEEK 
3.1. Hidden subgroups 

This section is strictly a review of ideas discussed in our
earlier article [7].

In the usual hidden subgroup problem, $G$ is a group, $X$ is
an unstructured set, and $f: G \to X$ is a function that hides a
subgroup $H$. This means that $f$ factors through the coset space
$G/H$ (either left or right cosets), and the factor $f: G/H \to X$ is
injective. In a quantum algorithm to find the subgroup $H$, $f$ is
implemented by a unitary oracle $U_f$ that adds the output to an
ancilla register. More precisely, the Hilbert space of the input
register is the group algebra $\mathbb{C}[G]$ when $G$ is finite (or some
finite-dimensional approximation to it when $G$ is infinite), the
output register is $\mathbb{C}[X]$, and the formula for $U_f$ is

$$U_f|g, x_0\rangle = |g, f(g) + x_0\rangle.$$

All known subexponential algorithms for the hidden subgroup
problem make no use of the output when the target set
$X$ is unstructured. (We do not know whether it is even possible
to make good use of the output with only subexponentially
many queries.) The best description of what happens is that
the algorithm discards the output and leaves the input register
in a mixed state $\rho$. However, it is commonly said that the
algorithm measures the output. This is a strange description if
the algorithm then makes no use of the measurement; its sole
virtue is that it leaves the quantum state of the input register
in a pure state $|\psi\rangle$. The state $|\psi\rangle$ is randomly chosen from a
distribution, which is the same as saying that the register is in
a mixed state $\rho$.

If the output of $f$ is always discarded, then the algorithm
works just as well if the output of $f$ is a state $|\psi(g)\rangle$ in a
Hilbert space $\mathcal{H}$. The injectivity condition is replaced by the
orthogonality condition $\langle\psi(g)|\psi(h)\rangle = 0$ when $g$ and $h$ lie in
distinct cosets of $H$. In this case $f$ would be implemented by
a unitary $U_f$ on $\mathbb{C}[G] \otimes \mathcal{H}$, with the condition that if $x_0 = 0$, then

$$U_f|g, 0\rangle = |g, \psi(g)\rangle.$$

Or we can have the oracle, rather than the algorithm, discard
the output. In this case, the oracle is a quantum operation (or
quantum map) $\mathcal{E}_{G/H}$ that measures the name of the coset $gH$
of $H$, and only returns the input conditioned on this measurement.

Suppose that the group $G$ is finite. Then it is standard to
supply the constant pure state $|G\rangle$ to the oracle $U_f$, and then
discard the output. The resulting mixed state,

$$\rho_{G/H} = \mathcal{E}_{G/H}(|G\rangle\langle G|),$$

is the uniform mixture of $|gH\rangle$ over all (say) left cosets $gH$ of
$H$. This step can also be relegated to the oracle, so that we can
say that the oracle simply broadcasts copies of $\rho_{G/H}$ with no
input.

Like our old algorithm, our new algorithm mainly makes
use of the state $\rho_{G/H}$, in the special case of the dihedral group
$G = D_N$. When $N = 2^n$, it is convenient to work by induction
on $n$, so that technically we use the state $\rho_{D_{2^k}/H_k}$ for $1 \le k \le n$.
However, this is not essential. The algorithm can work in
various ways with identical copies of $\rho_{D_N/H}$.

An important point is that the state $\rho_{G/H}$ is block diagonal
with respect to the weak Fourier measurement on $\mathbb{C}[G]$. More
precisely, the group algebra $\mathbb{C}[G]$ has a Burnside decomposition

$$\mathbb{C}[G] \cong \bigoplus_V V^* \otimes V,$$

where the direct sum is over irreducible representations $V$ of $G$,
and the direct sum is orthogonal. The weak Fourier measurement
is the measurement of the name of $V$ in this decomposition.
Since $\rho_{G/H}$ is block diagonal, if we have an efficient
algorithm for the quantum Fourier transform on $\mathbb{C}[G]$, then
we might as well measure the name of $V$ and condition the
state $\rho_{G/H}$ to a state on $V^* \otimes V$, because the environment
already knows the name of $V$.¹ Moreover, the state on the "row
space" $V^*$ is known to be independent of the state on $V$ and
carries no information about $H$ [6]. So the algorithm is left with
the name of $V$, and the conditional state $\rho_{V/H}$ on $V$. The
difference in treatment between the value $f(g)$ and the name of
the representation $V$, both of which are classical data that have
been revealed to the environment, is that the name of $V$ is
materially useful to existing quantum algorithms in this situation.
So it is better to say that the name of $V$ is measured while the
value $f(g)$ is discarded. (In fact, the two measurements don't
commute, so in a sense, they discredit each other.)

¹ In other words, Schrödinger's cat is out of the bag (or box).



3.2. Hidden shifts 

In our earlier work [7], we pointed out that if $A$ is an abelian
group, then the hidden subgroup problem on the generalized
dihedral group $G = (\mathbb{Z}/2) \ltimes A$ is equivalent to the abelian hidden
shift problem. The hard case of a hidden subgroup on $G$
consists of the identity and a hidden reflection. (By definition,
a reflection is an element in $G \setminus A$, which is necessarily an
element of order 2.) In this case, a single hiding function $f$ on $G$
is equivalent to two injective functions $f$ and $g$ on $A$ that differ
by a shift:

$$f(a) = g(a + s).$$

(Note that we allow an algorithm to evaluate them jointly in
superposition.) Finding the hidden shift $s$ is equivalent to finding
the hidden reflection.

In this article, we will consider multiple hidden shifts. By
this we mean that we have a set of endomorphisms

$$\{\phi_j : A \to A\}_{j \in J}$$

and a set of injective functions

$$\{f_j : A \to X\}_{j \in J}$$

such that

$$f_j(a) = f_0(a + \phi_j(s)).$$

Here $J$ is an abstract finite indexing set with an element $0 \in J$.
We assume that we know each $\phi_j$ explicitly (with $\phi_0 = 0$) and
that we would like to find the hidden shift $s$. In the cyclic case
$A = \mathbb{Z}/N$, we can write these relations as

$$f_j(a) = f_0(a + r_j s)$$

for some elements $r_j \in \mathbb{Z}/N$. Note that, for $s$ to be unique,
the maps $\phi_j$ or the factors $r_j$ must satisfy a nondegeneracy
condition. Since we will only address multiple hidden shifts in
the initial input heuristically, we will not say too much about
nondegeneracy when $|J| > 2$. If $|J| = 2$, then $r_1$ or $\phi_1$ must be
invertible to make $s$ unique, in which case we might as well
assume that they are the identity.

As a special case, we can look at the hidden subgroup
problem in a semidirect product $G = K \ltimes A$, where $K$ is
a finite group, not necessarily abelian. Our original algorithm
was a sieve that combined irreducible representations of
such a group $G$ to make improved irreducible representations.
Anomalously, the sieve did not work better when $|K| > 2$ than
in the dihedral case. The new algorithm can make some use
of multiple hidden shifts, although the acceleration from this
is not dramatic.



The principles of Section 3.1 apply to the hidden shift or
multiple hidden shift problem. For the following, assume that
$A$ is a finite group. We write

$$f(j, a) = f_j(a),$$

and we can again make a unitary oracle $U_f$ that evaluates $f$ as
follows:

$$U_f|j, a, x_0\rangle = |j, a, f(j,a) + x_0\rangle.$$

Suppose also that we can't make any sense of the value of
$f(j,a)$, so we discard it. As in Section 3.1, the unitary oracle
$U_f$ is thus converted to a quantum map $\mathcal{E}$ that makes a hidden
measurement of the value of $f$ and returns only the input registers,
i.e., a state in $\mathbb{C}[J] \otimes \mathbb{C}[A]$. Suppose that we provide the
map $\mathcal{E}$ with a state of the form

$$\rho = \sigma \otimes |A\rangle\langle A| \qquad (1)$$

where $\sigma$ is some possibly mixed state on $\mathbb{C}[J]$. As in Section
3.1, we claim that we might as well measure the Fourier
mode $b \in \hat{A}$ of the state $\mathcal{E}(\rho)$, because the environment
already knows what it is. To review, the dual abelian group $\hat{A}$ is
by definition the set of group homomorphisms

$$b : A \to S^1 \subset \mathbb{C},$$

and the Fourier dual state $|b\rangle$ is defined as

$$|b\rangle = \frac{1}{\sqrt{|A|}} \sum_{a \in A} b(a)|a\rangle.$$

We state the measurement claim more formally.

Proposition 3.1. Let $\mathcal{E}$ be the partial trace of $U_f$ given by
discarding the output, and let the state $\rho$ be as in (1). Then the
state $\mathcal{E}(\rho)$ is block diagonal with respect to the eigenspaces
of the measurement of $|b\rangle$. Also, the measurement has a
uniformly random distribution.

Proof. The key point is that $\rho$ is an $A$-invariant state and $\mathcal{E}$ is
an $A$-invariant map, where $A$ acts by translation on the $\mathbb{C}[A]$
register. The state $|A\rangle$ is $A$-invariant by construction, while $A$
has no action on the $\mathbb{C}[J]$ register. Meanwhile $\mathcal{E}$ is $A$-invariant
because it discards the output of $f$, and translation by $A$ can
be reproduced by permuting the values of $f$. Hence $\mathcal{E}(\rho)$ is also
an $A$-invariant state, and since the elements of $A$ act by unitary
operators, this says exactly that $\mathcal{E}(\rho)$ as an operator commutes
with $A$. The eigenspaces of the action of $A$ on $\mathbb{C}[J] \otimes \mathbb{C}[A]$ are
all of the form $\mathbb{C}[J] \otimes |b\rangle$, so the fact that $\mathcal{E}(\rho)$ commutes with $A$
is equivalent to the conclusion that $\mathcal{E}(\rho)$ is block diagonal with
respect to the eigenspaces of the measurement of $|b\rangle$.

To prove the second part, imagine that we also measure $|j\rangle$
on the register $\mathbb{C}[J]$. This measurement commutes with both
measuring the Fourier mode $|b\rangle$ and measuring or discarding
the output register $\mathbb{C}[X]$, so it changes nothing if we measure
$|j\rangle$ first. So we know $j$, and since $f_j: A \to X$ is injective,
measuring its value is the complete measurement of $|a\rangle$ starting
with the constant pure state $|A\rangle$. This yields the uniform
state $\rho_{\text{unif}}$ on $\mathbb{C}[A]$, so the value of $|b\rangle$ is also uniformly
distributed. □

Suppose further that in making the state $\rho$, the state $\sigma$ on
the $\mathbb{C}[J]$ register is the constant pure state $|J\rangle$. If the measured
Fourier mode is $b \in \hat{A}$, then the state of the $J$ register after
measuring this mode is:

$$|\psi\rangle \propto \sum_{j \in J} b(\phi_j(s))|j\rangle. \qquad (2)$$

This can be written more explicitly in the cyclic case $A = \mathbb{Z}/N$.
In this case there is an isomorphism $\hat{A} \cong A$, and we
can write any element $b \in \hat{A}$ as

$$b(a) = \exp(2\pi i ab/N),$$

and we can also write

$$\phi_j(a) = r_j a$$

for some elements $r_j \in \mathbb{Z}/N$. So we can then write

$$|\psi\rangle \propto \sum_{j \in J} \exp(2\pi i b r_j s/N)|j\rangle. \qquad (3)$$

At this point we know both $b$ and each $r_j$, although for different
reasons: $r_j$ is prespecified by the question, while $b$ was
measured and is uniformly random. Nonetheless, we may
combine these known values as $b_j = r_j b$ and write:

$$|\psi\rangle \propto \sum_{j \in J} \exp(2\pi i b_j s/N)|j\rangle. \qquad (4)$$

To conclude, the standard approach of supplying the oracle
$U_f$ with the constant pure state and discarding the output
leads us to the state (2), or equivalently (3) or (4). Nothing is
lost by measuring the Fourier mode, because that measurement
does not sacrifice any quantum information. In the rest of this
article, we will assume a supply of states of this type.
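A small numerical check of (2) through (4), and of the uniformity claim in Proposition 3.1, can be run on a toy instance. The Python/NumPy sketch below assumes $X$ is identified with $\mathbb{Z}/N$ and takes $f_0$ to be a random permutation, with $f_j(a) = f_0(a + r_j s)$; the function names are hypothetical. For every Fourier mode $b$ and every discarded output value $x$, the post-measurement vector on the $J$ register should be parallel to the phase vector in (3), and each $b$ should occur with probability $1/N$.

```python
import numpy as np


def fourier_mode_states(N, r, s, seed=0):
    """Simulate supplying |J> (x) |A> to U_f and projecting onto <b|.

    Toy instance: A = X = Z/N, a random permutation f0 of Z/N, and
    f_j(a) = f0(a + r[j] * s). Returns {b: phi} with phi[j, x] the
    unnormalized amplitude of |j> given Fourier mode b and output x.
    """
    rng = np.random.default_rng(seed)
    f0 = rng.permutation(N)
    psi = np.zeros((len(r), N, N), dtype=complex)
    for j, rj in enumerate(r):
        for a in range(N):
            psi[j, a, f0[(a + rj * s) % N]] = 1.0
    psi /= np.linalg.norm(psi)
    a = np.arange(N)
    states = {}
    for b in range(N):
        bra_b = np.exp(-2j * np.pi * a * b / N) / np.sqrt(N)  # conj of |b>
        states[b] = np.einsum('a,jax->jx', bra_b, psi)
    return states


def predicted_phase_vector(N, r, s, b):
    """The right side of (3): exp(2 pi i b r_j s / N) for each j."""
    return np.exp(2j * np.pi * b * np.array(r) * s / N)
```

The simulation knows $s$ only to build the oracle; the point of the check is that the measured state depends on $s$ exactly through the phases in (3).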

4. THE ALGORITHM 

4.1. The initial and final stages 

For simplicity, we describe the hidden shift algorithm when
$A = \mathbb{Z}/N$ and $N = 2^n$. The input to the algorithm is a supply of
states (4). As explained in our previous work [7], the problem
for any $A$, even $A$ infinite as long as it is finitely generated,
can be reduced to the cyclic case with overhead $\exp(O(\sqrt{d}))$.
Also for simplicity, we will just find the parity of the hidden
shift $s$. Also as explained in our previous work [7], if we know
the parity of $s$, then we can reduce to a hidden shift problem on
$\mathbb{Z}/2^{n-1}$ and work by induction. Finally, just as in our previous
algorithm, we seek a wishful special case of (4), namely the
qubit state

$$|\psi\rangle \propto |0\rangle + \exp(2\pi i (2^{n-1}) s/2^n)|1\rangle = |0\rangle + (-1)^s|1\rangle. \qquad (5)$$

If we measure whether $|\psi\rangle$ is $|+\rangle$ or $|-\rangle$, that tells us the parity
of $s$.
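As a quick check of (5), here is a minimal Python/NumPy sketch (the helper name is hypothetical) that builds the qubit with phase multipliers $0$ and $2^{n-1}$ and computes the probability of the outcome $|+\rangle$: it is 1 for even $s$ and 0 for odd $s$.

```python
import numpy as np


def plus_probability(n, s):
    """Probability that the qubit state (5) is measured as |+>."""
    N = 2 ** n
    b = np.array([0, 2 ** (n - 1)])          # phase multipliers of |0>, |1>
    psi = np.exp(2j * np.pi * b * s / N) / np.sqrt(2)
    plus = np.array([1.0, 1.0]) / np.sqrt(2)
    return abs(np.vdot(plus, psi)) ** 2
```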

Actually, although we will give all of the details in base 2,
we could just as well work in any fixed base, or let $N$ be any
product of small numbers. This generalization seems important
for precise optimization for all values of $N$, which is an
issue that we will only address briefly in the conclusion section.



4.2. Combining phase vectors 

Like the old algorithm, the new algorithm combines
unfavorable qubit states to make more favorable ones in
stages, but we change what happens in each stage. The old
algorithm was called a sieve, because it created favorable qubits
from a large supply of unfavorable qubits, just as many classical
sieve algorithms create favorable objects from a large supply
of candidates [1]. The new algorithm could also be called
a sieve, but all selection is achieved with quantum measurement
instead of a combination of measurement and matching.
The process can be called collimation, by analogy with its
meaning in optics: making rays parallel.

Consider a state of the form (4), where we write the coefficient
$b_j$ instead as a function $b(j)$, except that we make no
assumption that $b_j = r_j b$ for a constant $b$. We also assume that
the index set is explicitly the integers from $0$ to $\ell - 1$ for some
$\ell$, the length of $|\psi\rangle$:

$$J = [\ell] = \{0, 1, \ldots, \ell - 1\}.$$

We obtain:

$$|\psi\rangle \propto \sum_{0 \le j < \ell} \exp(2\pi i b(j) s/2^n)|j\rangle.$$

Call a vector of this type a phase vector. We view a phase
vector as favorable if every difference $b(j_1) - b(j_2)$ is divisible
by many powers of 2, and we will produce new phase
vectors from old ones that are more favorable. In other words,
we will collimate the phases. The algorithm collimates phase
vectors until finally it produces a state of the form (5). Note
that the state $|\psi\rangle$ only changes by a global phase if we add a
constant to the function $b$. (Or we can say that as a quantum
state, it does not change at all.) If $2^m \mid b(j_1) - b(j_2)$ for all $j_1, j_2$
and some $m < n$, then we can both subtract a constant from $b$ and divide
the numerator and denominator of $b(j)/2^n$ by $2^m$. So we can
write $|\psi\rangle$ as

$$|\psi\rangle \propto \sum_{0 \le j < \ell} \exp(2\pi i b(j) s/2^h)|j\rangle,$$

where $h = n - m$ is the height of $|\psi\rangle$. (We do not necessarily
assign the smallest height $h$ to a given $|\psi\rangle$.) We would like
to collimate phase vectors to produce one with length 2 and
height 1 (but not height 0).
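The renormalization just described can be sketched in a few lines of Python. The helper below is hypothetical and assumes $b$ is given as a list of integers with $N = 2^n$: it subtracts $b(0)$, finds the largest $2^m$ dividing every difference, and returns the reduced table together with the height $h = n - m$. It always returns the smallest height, which, as noted above, the algorithm does not require.

```python
import numpy as np


def normalize_phase_vector(b, n):
    """Subtract b[0] and divide out the common power of 2 of the differences.

    Returns (b_reduced, h) where the phases exp(2 pi i b(j) s / 2^n) equal,
    up to one global phase, exp(2 pi i b_reduced(j) s / 2^h).
    """
    N = 2 ** n
    diffs = [(bj - b[0]) % N for bj in b]
    # m = 2-adic valuation shared by all differences (m = n if all vanish).
    m = min(((d & -d).bit_length() - 1 for d in diffs if d), default=n)
    return [d >> m for d in diffs], n - m


def phases(b, s, h):
    """Entries exp(2 pi i b(j) s / 2^h) of an unnormalized phase vector."""
    return np.exp(2j * np.pi * np.array(b) * s / 2 ** h)
```

For example, with $n = 6$ the table $[5, 13, 29, 21]$ has differences $0, 8, 24, 16$, so it reduces to $[0, 1, 3, 2]$ at height 3.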

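Given two phase vectors of height $h$,

$$|\psi_1\rangle \propto \sum_{0 \le j_1 < \ell_1} \exp(2\pi i b_1(j_1) s/2^h)|j_1\rangle, \qquad |\psi_2\rangle \propto \sum_{0 \le j_2 < \ell_2} \exp(2\pi i b_2(j_2) s/2^h)|j_2\rangle,$$

their joint state is a double-indexed phase vector that also has
height $h$:

$$|\psi_1, \psi_2\rangle = |\psi_1\rangle \otimes |\psi_2\rangle \propto \sum_{\substack{0 \le j_1 < \ell_1 \\ 0 \le j_2 < \ell_2}} \exp(2\pi i (b_1(j_1) + b_2(j_2)) s/2^h)|j_1, j_2\rangle.$$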

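We can now collimate this phase vector by measuring

$$c = b_1(j_1) + b_2(j_2) \pmod{2^m}$$

for some $m < h$. Let $P_c$ be the corresponding measurement
projection. The result is another phase vector

$$P_c|\psi_1, \psi_2\rangle \propto \sum_{(j_1, j_2) \in J} \exp(2\pi i (b_1(j_1) + b_2(j_2)) s/2^h)|j_1, j_2\rangle,$$

but one with a messy indexing set:

$$J = \{(j_1, j_2) \mid b_1(j_1) + b_2(j_2) \equiv c \pmod{2^m}\}.$$

We can compute the index set $J$, in fact entirely classically,
because we know $c$. We can compute the phase multiplier
function $b$ as the sum of $b_1$ and $b_2$. Finally, we would like
to reindex using some bijection $\pi : J \to [\ell_{\text{new}}]$, where
$\ell_{\text{new}} = |J|$. As we renumber $J$, we also permute the phase
vector $P_c|\psi_1, \psi_2\rangle$. Then there is a subunitary operator

$$U_\pi : \mathbb{C}[[\ell_1] \times [\ell_2]] \to \mathbb{C}[[\ell_{\text{new}}]]$$

that annihilates vectors orthogonal to $\mathbb{C}[J]$ and that is unitary
on $\mathbb{C}[J]$. Then

$$|\psi_{\text{new}}\rangle \propto U_\pi P_c|\psi_1, \psi_2\rangle.$$

The vector $|\psi_{\text{new}}\rangle$ has height $h - m$.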

Actually, collimation generalizes to more than two input
vectors. Given a list of phase vectors

$$|\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_r\rangle,$$

and given a collimation parameter $m$, we can produce a collimated
state $|\psi_{\text{new}}\rangle$ from them. We summarize the process in
algorithm form:

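Algorithm 4.1 (Collimation). Input: A list of phase vectors

$$|\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_r\rangle$$

of lengths $\ell_1, \ldots, \ell_r$, and a collimation parameter $m$.

1. Notionally form the phase vector

$$|\psi\rangle = |\psi_1\rangle \otimes |\psi_2\rangle \otimes \cdots \otimes |\psi_r\rangle$$

with indexing set

$$[\ell_1] \times [\ell_2] \times \cdots \times [\ell_r]$$

and phase multiplier function

$$b(\vec{j}) = b(j_1, j_2, \ldots, j_r) = b_1(j_1) + b_2(j_2) + \cdots + b_r(j_r).$$

2. Measure $|\psi\rangle$ according to the value of

$$c = b(\vec{j}) \bmod 2^m \qquad (6)$$

to obtain $P_c|\psi\rangle$.

3. Find the set $J$ of tuples $\vec{j}$ that satisfy (6). Set $\ell_{\text{new}} = |J|$ and
pick a bijection

$$\pi : J \to [\ell_{\text{new}}].$$

4. Apply $\pi$ to the value of $b$ on $J$ and apply $U_\pi$ to $|\psi\rangle$ to make
$|\psi_{\text{new}}\rangle$, and return it.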
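To see Algorithm 4.1 end to end, here is a classical statevector simulation in Python/NumPy. It is only a sketch under stated assumptions: the function name is hypothetical, the simulator uses the hidden shift $s$ to build amplitudes (the quantum algorithm never knows $s$), and the bijection $\pi$ is chosen to sort the new multipliers, which is the choice made in the proof of Proposition 4.2. The returned state should equal the phase vector of the returned table at height $h - m$, up to a global phase.

```python
import itertools

import numpy as np


def collimate(tables, h, m, s, rng):
    """Simulate Algorithm 4.1 on r phase vectors of common height h.

    tables[i] is the phase multiplier table b_i. The hidden shift s is
    known only to this simulator, to build amplitudes; a real run never
    sees it. Returns (b_new, state_new) with b_new of height h - m.
    """
    # Step 1: notional product state, indexed by tuples (j_1, ..., j_r).
    tuples = list(itertools.product(*(range(len(t)) for t in tables)))
    bsum = np.array([sum(t[j] for t, j in zip(tables, jt)) for jt in tuples])
    amps = np.exp(2j * np.pi * bsum * s / 2 ** h)
    amps /= np.linalg.norm(amps)

    # Step 2: measure c = b(j) mod 2^m with Born-rule probabilities.
    low = bsum % 2 ** m
    probs = np.array([np.sum(np.abs(amps[low == c]) ** 2) for c in range(2 ** m)])
    c = rng.choice(2 ** m, p=probs / probs.sum())

    # Step 3: the index set J, reindexed by sorting the new multipliers.
    J = np.flatnonzero(low == c)
    new_vals = ((bsum[J] - c) // 2 ** m) % 2 ** (h - m)
    perm = np.argsort(new_vals, kind="stable")

    # Step 4: apply pi to b and to the (renormalized) measured state.
    b_new = [int(v) for v in new_vals[perm]]
    state_new = amps[J[perm]] / np.linalg.norm(amps[J[perm]])
    return b_new, state_new
```

For three tables of length 8 and $m = 3$, the heuristic (7) predicts a new length of about $8^3/2^3 = 64$.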



Algorithm 4.1 is our basic method to collimate phase vectors.
We can heuristically estimate the length $\ell_{\text{new}}$ by assuming
that $b(\vec{j})$ is uniformly distributed mod $2^m$. In this case,

$$\ell_{\text{new}} \approx \frac{\ell_1 \ell_2 \cdots \ell_r}{2^m}. \qquad (7)$$

So $\ell$ stays roughly constant when $\ell \approx 2^{m/(r-1)}$.



4.3. The complexity of collimation 

Proposition 4.2. Let $|\psi_1\rangle$ and $|\psi_2\rangle$ be two phase vectors of
length $\ell_1$ and $\ell_2$ and height $h$, and suppose that they are
collimated mod $2^m$ to produce a phase vector $|\psi_{\text{new}}\rangle$ of length $\ell_{\text{new}}$.
Suppose also that the quantum computer is allowed QRACM.
Then taking $\ell_{\max} = \max(\ell_1, \ell_2, \ell_{\text{new}})$ and $r = 2$, Algorithm 4.1
needs

• $\tilde{O}(\ell_{\max})$ classical time (where "$\tilde{O}$" allows factors of
both $\log \ell_{\max}$ and $h \le n = \log N$),

• $O(\ell_{\max} h)$ classical space,

• $O(\ell_{\max} \max(m, \log \ell_{\max}))$ classical space with quantum
access,

• $\mathrm{poly}(\log \ell_{\max})$ quantum time, and

• $O(\log \ell_{\max})$ quantum space.

Proof. First, we more carefully explain the data structure of
a phase vector $|\psi\rangle$. The vector $|\psi\rangle$ itself can be stored in
$\lceil \log_2 \ell_{\max} \rceil$ qubits. The table $b$ of phase multipliers is a
table of length $O(\ell_{\max})$ whose entries have $h$ bits, so this is
$O(\ell_{\max} h)$ bits of classical space. Algorithm 4.1 needs the low
$m$ bits of each entry in the table, so $O(\ell_{\max} m)$ bits are kept in
quantum access memory. We also assume that the table $b$ is
sorted on low bits.



We follow through the steps of Algorithm 4.1, taking care
to manage resources at each step. First, measuring

$$c = b_1(j_1) + b_2(j_2) \pmod{2^m}$$

can be done in quantum time $\mathrm{poly}(\log \ell, m)$ by looking up the
values and adding them. As usual, when performing a partial
quantum measurement, the output must be copied to an ancilla
and the scratch work (in this case the specific values of $b_1$ and
$b_2$) must be uncomputed.

The other step of collimation is the renumbering. To review,
the measurement of $c$ identifies a set of double indices

$$J \subseteq [\ell_1] \times [\ell_2].$$

These indices must be renumbered with a bijection

$$\pi : J \to [\ell_{\text{new}}],$$

indeed the specific bijection that sorts the new phase multiplier
table $b = b_1 + b_2$. The function $\pi$ can be computed in
classical time $\tilde{O}(\ell)$ using standard algorithms, using the fact
that $b_1$ and $b_2$ are already sorted. More explicitly, we make an
outer loop over decompositions

$$c = c_1 + c_2 \in \mathbb{Z}/2^m.$$

In an inner loop, we write all solutions to the equations

$$b_1(j_1) \equiv c_1 \pmod{2^m}, \qquad b_2(j_2) \equiv c_2 \pmod{2^m}$$

using sorted lookup. This creates a list of elements of $J$ in
some order. We can write the values of

$$b(j_1, j_2) = b_1(j_1) + b_2(j_2)$$

along with the pairs $(j_1, j_2) \in J$ themselves. Then $b$ can be
sorted and $J$ can be sorted along with it.
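The renumbering step can be sketched in Python as follows. The function name is hypothetical, plain lists stand in for the classical tables, and `bisect` on the low-bit-sorted tables plays the role of sorted lookup. Apart from the final sort, the work is one pass over the $2^m$ residues plus one unit per element of $J$, which is consistent with the $\tilde{O}(\ell)$ bound when $2^m$ is not much larger than $\ell$.

```python
import bisect


def collimation_index_set(b1, b2, m, c):
    """Classical part of collimating two phase vectors, as in the proof.

    Returns (b_new, pi_inv): the sorted table b = b1 + b2 restricted to J,
    and pi_inv[k] = (j1, j2), the element of J renumbered as k.
    """
    M = 2 ** m
    # Indices of each table sorted on low bits, as the proof assumes.
    s1 = sorted(range(len(b1)), key=lambda j: b1[j] % M)
    s2 = sorted(range(len(b2)), key=lambda j: b2[j] % M)
    low1 = [b1[j] % M for j in s1]
    low2 = [b2[j] % M for j in s2]

    entries = []
    for c1 in range(M):                  # outer loop: c = c1 + c2 in Z/2^m
        c2 = (c - c1) % M
        lo1, hi1 = bisect.bisect_left(low1, c1), bisect.bisect_right(low1, c1)
        lo2, hi2 = bisect.bisect_left(low2, c2), bisect.bisect_right(low2, c2)
        for j1 in s1[lo1:hi1]:           # inner loop: all solutions
            for j2 in s2[lo2:hi2]:
                entries.append((b1[j1] + b2[j2], j1, j2))

    entries.sort()                       # sort b, and J along with it
    b_new = [v for v, _, _ in entries]
    pi_inv = [(j1, j2) for _, j1, j2 in entries]
    return b_new, pi_inv
```

The list `pi_inv` is the stored inverse bijection, an ordinary 1-dimensional array.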

This creates a stored form of the inverse bijection $\pi^{-1}$, which is an ordinary 1-dimensional array. We will want this, and we will also want quantum access to the forward bijection $\pi$ stored as an associative array. Since we will need quantum access to $\pi$, we would like to limit the total use of this expensive type of space. We can make a special associative array to make sure that the total extra space is $O(\ell_{\max} \log \ell_{\max})$ bits. For instance, we can make a list of elements of $J$ sorted by $(j_1, j_2)$, a table of $\pi$ sorted in the same order, and an index of pointers from $[\ell_1]$ to the first element of $J$ with any given value of $j_1$.
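One possible layout for this associative array is sketched below, assuming that $\pi^{-1}$ is given as a Python list of pairs $(j_1, j_2)$, as produced by the renumbering step. The names `build_forward_map` and `lookup_pi` are hypothetical, and the extra sentinel entry `first[ell1]` is our own convenience so that each block of equal $j_1$ has an explicit end.

```python
import bisect


def build_forward_map(pi_inv, ell1):
    """Build three tables: J sorted by (j1, j2), pi in the same order,
    and pointers from each j1 in [ell1] to its first entry in J."""
    order = sorted(range(len(pi_inv)), key=lambda k: pi_inv[k])
    keys = [pi_inv[k] for k in order]  # elements of J sorted by (j1, j2)
    pi_vals = order  # pi(keys[i]) == order[i]
    # first[j1] = index of the first key with this j1 (j2 >= 0 assumed);
    # first[ell1] is a sentinel marking the end of the last block.
    first = [bisect.bisect_left(keys, (j1, -1)) for j1 in range(ell1 + 1)]
    return keys, pi_vals, first


def lookup_pi(keys, pi_vals, first, j1, j2):
    """Return pi(j1, j2), or None if (j1, j2) is not in J."""
    lo, hi = first[j1], first[j1 + 1]
    i = bisect.bisect_left(keys, (j1, j2), lo, hi)  # search only the j1 block
    if i < hi and keys[i] == (j1, j2):
        return pi_vals[i]
    return None


pi_inv = [(2, 0), (0, 1), (1, 1), (0, 0)]
tables = build_forward_map(pi_inv, ell1=3)
print(lookup_pi(*tables, 0, 0), lookup_pi(*tables, 1, 0))  # 3 None
```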

The final and most delicate step is to apply the bijection $\pi$ to $|\psi\rangle$ in quantum time polynomial in $\log \ell$. Imagine more abstractly that $|\psi\rangle$ is a state in a Hilbert space $\mathbb{C}^s$ supported on a subset $X \subseteq [s]$, and that we would like to transform it to a state in a Hilbert space $\mathbb{C}^t$ supported on a subset $Y \subseteq [t]$ of the same size, using a bijection $\pi: X \to Y$. We use the group structures $[s] = \mathbb{Z}/s$ and $[t] = \mathbb{Z}/t$, and we assume quantum access to both $\pi$ and $\pi^{-1}$. Then we will use these two permutation operators acting jointly on a $\mathbb{C}^s$ register and a $\mathbb{C}^t$ register:

$$U_1|x, y\rangle = |x, y + \pi(x)\rangle, \qquad U_2|x, y\rangle = |x - \pi^{-1}(y), y\rangle.$$

A priori, $\pi(x)$ is only defined for $x \in X$ and $\pi^{-1}(y)$ is only defined for $y \in Y$; we extend them by 0 (or extend them arbitrarily) to other values of $x$ and $y$. Then clearly

$$U_2U_1|x, 0\rangle = |0, \pi(x)\rangle.$$

Thus

$$U_2U_1|\psi, 0\rangle$$



is what we want. Following the rule of resetting the height to 
0, we can also let 



□ 
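Because $U_1$ and $U_2$ only permute computational basis states, the two-step trick can be checked classically. In this sketch (our own illustration) a state is a Python dict mapping basis pairs $(x, y)$ to amplitudes; `apply_u1` and `apply_u2` are hypothetical names, $\pi$ is extended by 0 outside $X$, and $\pi^{-1}$ is extended by 0 outside $Y$.

```python
def apply_u1(state, pi, t):
    """U1|x, y> = |x, y + pi(x) mod t>, with pi extended by 0 outside X."""
    return {(x, (y + pi.get(x, 0)) % t): amp for (x, y), amp in state.items()}


def apply_u2(state, pi_inv, s):
    """U2|x, y> = |x - pi_inv(y) mod s, y>, with pi_inv extended by 0 outside Y."""
    return {((x - pi_inv.get(y, 0)) % s, y): amp for (x, y), amp in state.items()}


# Example: X = {0, 2, 4} in Z/5 and Y = {3, 0, 1} in Z/4.
pi = {0: 3, 2: 0, 4: 1}
pi_inv = {y: x for x, y in pi.items()}
psi = {(0, 0): 0.5, (2, 0): 0.5, (4, 0): 0.7071}  # |psi>|0>
out = apply_u2(apply_u1(psi, pi, 4), pi_inv, 5)
print(out)  # {(0, 3): 0.5, (0, 0): 0.5, (0, 1): 0.7071}
```

Every output basis state has first coordinate 0, so the first register factors out as $|0\rangle$ and the amplitudes now sit on $\pi(X) = Y$.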



Corollary 4.3. Taking the hypotheses of Proposition 4.2, if the quantum computer has no quantum access memory, then Algorithm 4.1 can be executed with $r = 2$ with

• $\tilde{O}(\ell_{\max})$ quantum time (and classical time),

• $O(\ell_{\max})$ classical space, and

• $O(\log \ell_{\max})$ quantum space.

Corollary 4.3 follows immediately from Proposition 4.2 and Proposition 2.2. The point is that, even though there is a performance penalty in the absence of quantum access memory, the same algorithm still seems competitive.

4.4. The outer algorithm 

In this section we combine the ideas of Sections 3.2, 4.1, 4.2, and 4.3 to make a complete algorithm. We present the algorithm with several free parameters. We will heuristically analyze these parameters in Section 4.5. Then in Section 4.6 we will simply make convenient choices for the parameters to prove that the algorithm has quantum time and classical space complexity $\exp(O(\sqrt{n}))$.

The algorithm has a recursive subroutine to produce a phase vector of height 1. The subroutine uses a collimation parameter $0 < m(h) \le n - h$ and a starting minimum length $\ell_0$.

Algorithm 4.4 (Collimation sieve). Input: A height $h$, a collimation parameter $m = m(h)$, a branching parameter $r = r(h)$, a starting minimum length $\ell_0$, and access to the oracle $U_f$. Goal: To produce a phase vector of height $h$.

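1. If $h = n$, extract phase vectors

$$|\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_s\rangle$$

of height $n$ from the oracle as described in Section 3 until the length of

$$|\psi_{\text{new}}\rangle = |\psi_1, \psi_2, \ldots, \psi_s\rangle$$

is at least $\ell_0$. Return $|\psi_{\text{new}}\rangle$.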






2. Otherwise, recursively and sequentially obtain a sequence of phase vectors

$$|\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_r\rangle$$

of height $h + m$.



4. Collimate the vectors mod $2^m$ using Algorithm 4.1 to produce a phase vector $|\psi_{\text{new}}\rangle$ of height $h$. Return it.
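The recursion in Algorithm 4.4 can be mimicked by a classical simulation that tracks only the table of phase multipliers, which is all that collimation touches. This sketch is our own, under several assumptions: branching parameter $r = 2$ only; a hypothetical `oracle_qubit` that draws a uniformly random multiplier in place of the weak Fourier measurement; and multipliers of a height-$h$ vector stored in $\mathbb{Z}/2^h$, with the common low bits $c$ stripped after each collimation (they contribute only a global phase). Since every phase vector has amplitudes of equal magnitude, measuring $c$ is simulated by reading the low bits of a uniformly random pair sum. It follows steps 1, 2, and 4 as listed above.

```python
import random


def oracle_qubit(n, rng):
    # Hypothetical stand-in for one weak Fourier measurement: the qubit
    # |0> + exp(2 pi i b s / 2^n)|1> is tracked only by its table [0, b].
    return [0, rng.randrange(2 ** n)]


def tensor(b1, b2, h):
    # Tensor product of two phase vectors of height h: multipliers add mod 2^h.
    return [(x + y) % 2 ** h for x in b1 for y in b2]


def collimate(b1, b2, m, h_in, rng):
    # Measuring c: the outcome is the low m bits of a uniformly random pair sum.
    sums = tensor(b1, b2, h_in)
    c = rng.choice(sums) % 2 ** m
    # Keep the matching pairs, strip the shared low m bits, and sort;
    # the result is a phase vector of height h_in - m.
    return sorted(((x - c) % 2 ** h_in) >> m for x in sums if x % 2 ** m == c)


def sieve(h, n, m, ell0, rng):
    # Collimation sieve with r = 2 (sketch).
    if h == n:
        b = [0]  # trivial phase vector of length 1
        while len(b) < ell0:
            b = tensor(b, oracle_qubit(n, rng), n)
        return b
    step = min(m, n - h)  # collimation parameter m(h) <= n - h
    # Depth-first: the left result is held while the right subtree is built,
    # so only one root-to-leaf path of phase vectors is in memory at a time.
    left = sieve(h + step, n, m, ell0, rng)
    right = sieve(h + step, n, m, ell0, rng)
    return collimate(left, right, step, h + step, rng)


top = sieve(1, 12, 4, 16, random.Random(1))
print(len(top), set(top))  # a height-1 table: every entry is 0 or 1
```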



When called with $h = 1$, Algorithm 4.4 produces a phase vector

$$|\psi\rangle \propto \sum_{0 \le j < \ell} (-1)^{b(j)s}|j\rangle.$$

We then pick a maximal subset $X \subseteq [\ell]$ on which $b$ is equally often 0 and 1. (Note that this takes almost no work, because the collimation step sorts $b$.) If $X$ is empty, then we must run Algorithm 4.4 again. Otherwise, we measure whether $|\psi\rangle$ is in $\mathbb{C}[X]$. If the measurement fails, then we must run Algorithm 4.4 again. Otherwise the measured form of $|\psi\rangle$ has a qubit factor of the form

$$|0\rangle + (-1)^s|1\rangle,$$

and this can be measured to obtain the parity of $s$.
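For a classical check of this final step, the sketch below takes a height-1 table $b$ with entries in $\{0, 1\}$ and, since it is only a simulation, the hidden shift $s$ itself. The function name `final_parity` is hypothetical. It returns `None` whenever the text says Algorithm 4.4 must be run again, and otherwise applies a Hadamard gate to the residual qubit $|0\rangle + (-1)^s|1\rangle$, which is our choice of how to realize the final parity measurement.

```python
import math
import random


def final_parity(b, s, rng):
    """Simulate the last step on a height-1 table b (entries 0 or 1).

    Returns s mod 2, or None when the algorithm must be rerun."""
    zeros = [j for j, bit in enumerate(b) if bit == 0]
    ones = [j for j, bit in enumerate(b) if bit == 1]
    k = min(len(zeros), len(ones))
    if k == 0:
        return None  # X is empty
    # X pairs zeros[i] with ones[i] for i < k; with equal-magnitude amplitudes,
    # the state lies in C[X] with probability |X| / len(b).
    if rng.random() >= 2 * k / len(b):
        return None  # membership measurement failed
    # Measuring which pair survives leaves (|0> + (-1)^s |1>)/sqrt(2); the
    # choice of pair does not affect the phase, so it is not simulated here.
    a0, a1 = 1 / math.sqrt(2), (-1) ** s / math.sqrt(2)
    # After a Hadamard, outcome 1 has probability |a0 - a1|^2 / 2 (0 or 1 here).
    p1 = abs(a0 - a1) ** 2 / 2
    return 1 if p1 > 0.5 else 0


print([final_parity([0, 0, 1, 1], s, random.Random(0)) for s in range(4)])
# [0, 1, 0, 1]
```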



Algorithm 4.4 recursively makes a tree of phase vectors that are more and more collimated, starting with phase vectors obtained from the hiding function $f(j, a)$ by the weak Fourier measurement. An essential idea, which is due to Regev and is used in his algorithm, is that with the collimation method, the tree can be explored depth-first and does not need to be stored in its entirety. Only one path to a leaf needs to be stored. No matter how the collimation parameter is set, the total quantum space used is $O(n^2)$, while the total classical space used is $O(n \max(\ell))$. (But the algorithm is faster with quantum access to the classical space.)

An interesting feature of the algorithm is that its middle part, the collimation sieve, is entirely pseudoclassical. The algorithm begins by applying QFTs to oracle calls, as in Shor's algorithm. It ends with the same parity measurement as Simon's algorithm. These parts of the algorithm are fully quantum in the sense that they use unitary operators that are not permutation matrices. However, collimation consists entirely of permutations of the computational basis and measurements in the computational basis.



4.5. Heuristic analysis 

Heuristically the algorithm is the fastest when r = 2. 
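As a numerical companion to the heuristic analysis in this section, the sketch below iterates the cost estimate $f(n) \approx \min_m (2^m + 2f(n-m))$ for $r = 2$ and its tropicalized form $g(n) = \min_m \max(m, g(n-m) + 1)$. The base cases $f(0) = 1$ and $g(0) = 0$ are our own assumptions. With them, $g(m(m+1)/2) = m$ exactly, so $g(n)$ grows like $\sqrt{2n}$.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def g(n):
    """Tropicalized recurrence g(n) = min_m max(m, g(n - m) + 1); g(0) = 0 assumed."""
    if n == 0:
        return 0
    return min(max(m, g(n - m) + 1) for m in range(1, n + 1))


@lru_cache(maxsize=None)
def f(n):
    """Cost estimate f(n) = min_m (2^m + 2 f(n - m)) for r = 2; f(0) = 1 assumed."""
    if n == 0:
        return 1
    return min(2 ** m + 2 * f(n - m) for m in range(1, n + 1))


for n in (50, 100, 200):
    # columns: n, g(n), log2 f(n), sqrt(2n)
    print(n, g(n), round(math.log2(f(n)), 1), round(math.sqrt(2 * n), 1))
```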

Suppose that the typical running time of the algorithm is $f(n)$, with some initial choice of $m = m(1)$. First, creating a phase vector of height $h$ is similar to running the whole algorithm with $n' = n - h$. So the total computation time (both classical and quantum) can be estimated as

$$f(n) \approx \min_m \left(2^m + 2f(n-m)\right). \qquad (7)$$

Here the first term is dominated by the classical work of collimation, while the second term is the recursive work. The two terms of the minimand are very disparate outside of a narrow range of values of $m$. So we can let $g(n) = \log_2 f(n)$, convert multiplication to addition, and approximate addition by max. (This type of asymptotic approximation is lately known in mathematics as tropicalization.) We thus obtain

$$g(n) \approx \min_m \max\left(m,\ g(n-m) + 1\right).$$

The solutions to this equation are of the form

$$g\left(\frac{m(m+1)}{2} + c\right) = m,$$

where $c$ is a constant. We obtain the heuristic estimate

$$f(n) \stackrel{?}{=} O\left(2^{\sqrt{2n}}\right) \qquad (8)$$

for both the quantum plus classical time complexity and the classical space complexity of the algorithm. We put a question mark because we have not proven this estimate. In particular, our heuristic calculation does not address random fluctuations in the lengths $\ell$ of the phase vectors.

If the quantum computer does not have QRACM, or if QRACM is no cheaper than quantum memory, then the heuristic (8) is the best that we know how to do. If the algorithm is implemented with QRACM, then the purely quantum cost is proportional to the number of queries. In this case, if there is extra classical space, we can make $m$ larger and larger to fill the available space and save quantum time. This is the "second parameter" mentioned in Section 1. However, this adjustment only makes sense when classical time is much cheaper than quantum time. In particular, (8) is our best heuristic if classical and quantum time are simply counted equally.

If classical space is limited, then equation (7) tells us that we can compensate by increasing $r$. To save as much space as possible, we can maintain $\ell = 2$ and adjust $r$ in each stage of the sieve to optimize the algorithm. In this case the algorithm reduces to Regev's algorithm.

4.6. A rigorous complexity bound

The goal of this section is to rigorously prove $2^{O(\sqrt{n})}$ complexity bounds for a likely inefficient modified version of Algorithm 4.4. For simplicity we assume that $n = m^2$, and we assume a single hidden shift between two functions $f_0$ and $f_1$. In the first stage, we form phase vectors of length $\ell_0 = 2^{m+1}$ from $m+1$ qubits of the form (5). We construct the collimation sieve using Algorithm 4.1 to align the $m$ low bits of the phase multiplier at each stage except the last stage. Suppose that the output phase vector $|\psi\rangle$ from a use of Algorithm 4.1 has length $\ell_1$. We divide the indexing set of $|\psi\rangle$ into segments of length $\ell_0$ and a leftover segment of length $\ell_2 < \ell_0$. Then we perform a partial measurement corresponding to this partition into segments.
If the measured segment is the short one, then the phase vector is simply discarded; in particular, if $\ell_1 < \ell_0$, then the vector is discarded with no measurement. Finally, at the last stage, we measure the phase vector $|\psi\rangle$ according to the value of $m-1$ of the remaining $m$ phase multiplier bits. After this partial measurement we have a residual qudit with some $\ell_1$ states. We pair these states arbitrarily, leaving one singleton if $\ell_1$ is odd, and again perform the partial measurement corresponding to this partition. If the residual state is a qubit, and a qubit of the form (5), then we use it to measure the parity of the hidden shift $s$; otherwise we discard it and restart the entire computation.
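The trimming measurement of this modified sieve and the pairing at the last stage are the same operation, with segment lengths $\ell_0$ and 2 respectively. The sketch below is our own illustration, assuming equal-magnitude amplitudes, so that the partial measurement selects the segment containing a uniformly random index; `trim` is a hypothetical name.

```python
import random


def trim(indices, seg_len, rng):
    """Partial measurement onto consecutive segments of length seg_len plus a
    shorter leftover segment. Returns the kept segment, or None to discard."""
    full = len(indices) // seg_len
    if full == 0:
        return None  # too short: discard with no measurement
    j = rng.randrange(len(indices))  # equal-magnitude amplitudes: uniform index
    seg = j // seg_len
    if seg == full:
        return None  # landed in the short leftover segment
    return indices[seg * seg_len:(seg + 1) * seg_len]


rng = random.Random(7)
print(trim(list(range(10)), 4, rng))  # a segment of length 4, or None
print(trim([5, 6, 7], 2, rng))        # last-stage pairing: a pair, or None
```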

Proposition 4.5. The modified form of Algorithm 4.4 uses quantum time and classical space $2^{O(\sqrt{n})}$, and quantum space $O(\log N)$.

Proof. At the heuristic level, the bounds are straightforward. The question is to establish that the modified algorithm succeeds at each stage with probability bounded away from 0. The algorithm can locally fail in three ways: (1) In the intermediate stages, it can make a phase vector which is too short, either with $\ell_1 < \ell_0$, or because the trimming measurement leaves the remnant of length $\ell_2 < \ell_0$. (2) At the final stage, we might have $\ell_1 < 2$, or the remnant after the trimming measurement might again have length $\ell_2 < 2$. (3) If a qubit is produced at the very end, it could have a trivial phase multiplier rather than the form (5).

To address the first problem, we have two phase vectors $|\psi_1\rangle$ and $|\psi_2\rangle$ with tables of phase multipliers $b_1$ and $b_2$. The combined phase vector $|\psi_1, \psi_2\rangle$ then has the phase multiplier $b_1 + b_2$, and we measure $|\psi_1, \psi_2\rangle$ according to the low $m$ bits of $b_1 + b_2$. Heuristically we can suppose that $b_1 + b_2$ is randomly distributed, even though this cannot be exactly true. We claim that at the rigorous level, the modified sieve has an adequate chance of success regardless of the distribution of $b_1 + b_2$. The phases listed in $b_1 + b_2$ are divided among only $2^m$ buckets. If we pick an entry at random, which gives the correct distribution for the partial measurement, then with probability at least 3/4 it lies in a bucket of size $\ell_1 \ge \ell_0$, regardless of how the entries are distributed. Then, if $\ell_1 \ge \ell_0$, the probability that the trimming measurement creates a phase vector of length exactly $\ell_0$ is at least 1/2.

The second problem is addressed in the same way: the $2^{m+1}$ entries of the last-stage phase vector are divided among $2^{m-1}$ buckets, so measuring which bucket produces a qudit of length $\ell_1 \ge 2$ with probability at least 3/4. Then the trimming measurement produces a qubit with probability at least 2/3.
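The two constants in this paragraph can be checked exactly with rational arithmetic. The sketch below is our own. `worst_case_last_stage` builds the adversarial bucket sizes that put as many entries as possible into singleton buckets (at most one fewer than the number of buckets, because the $2^{m+1}$ entries outnumber the $2^{m-1}$ buckets), and `pairing_success` is the chance of landing in a pair when $\ell_1$ states are paired with at most one singleton. All function names are hypothetical.

```python
from fractions import Fraction


def prob_in_large_bucket(sizes, threshold):
    """Probability that a uniformly random entry lies in a bucket of size >= threshold."""
    return Fraction(sum(s for s in sizes if s >= threshold), sum(sizes))


def worst_case_last_stage(m):
    """Adversarial split of 2^(m+1) entries into 2^(m-1) buckets: maximize singletons."""
    buckets, entries = 2 ** (m - 1), 2 ** (m + 1)
    sizes = [1] * (buckets - 1) + [entries - (buckets - 1)]
    return prob_in_large_bucket(sizes, 2)


def pairing_success(ell1):
    """Chance that pairing ell1 states (one singleton if ell1 is odd) yields a qubit."""
    return Fraction(2 * (ell1 // 2), ell1)


print([str(worst_case_last_stage(m)) for m in range(1, 6)])
print(min(pairing_success(ell) for ell in range(2, 100)))  # 2/3
```

The worst case equals $1 - (2^{m-1} - 1)/2^{m+1}$, which stays above $3/4$ for every $m$, and the pairing bound is attained at $\ell_1 = 3$.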

Finally, the third problem requires some knowledge of the distribution of the phase multipliers. The final phase multiplier is either $2^{n-1}$ or 0, and the former value is the favorable one that allows us to measure the parity of $s$. Recall that the initial phase qubits (5) had uniformly random phase multipliers; in particular, their highest bits are uniformly random and independent. All of the decisions in the algorithm so far depend only on the other bits of the phase multipliers. The final phase multiplier is a sum of some of the high bits of the initial phase multipliers, and is therefore also uniformly random. So we obtain the state (5) with probability 1/2 at this stage. □
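The last step of this proof uses the fact that the parity of any nonempty collection of independent uniform bits is itself uniform. A brute-force check for small $k$, our own illustration with a hypothetical `parity_counts` helper, counts both parities over all $2^k$ assignments.

```python
from itertools import product


def parity_counts(k):
    """Count the 2^k assignments of k bits by the parity of their sum: (even, odd)."""
    odd = sum(sum(bits) % 2 for bits in product((0, 1), repeat=k))
    return 2 ** k - odd, odd


print([parity_counts(k) for k in range(1, 6)])
# [(1, 1), (2, 2), (4, 4), (8, 8), (16, 16)]
```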

5. CONCLUSIONS 

At first glance, the running time of our new algorithm for DHSP or hidden shift is "the same" as that of our first algorithm, since both algorithms run in time $2^{O(\sqrt{\log N})}$. Meanwhile, Regev's algorithm runs in time $2^{O(\sqrt{\log N \log\log N})}$, which may appear to be almost as fast. Of course, these expressions hide the real differences in performance between these algorithms, simply because asymptotic notation has been placed in the exponent. All polynomial-time algorithms with input of length $n$ run in time

$$n^{O(1)} = 2^{O(\log n)}.$$

Nonetheless, polynomial accelerations are taken seriously in complexity theory, whether they are classical or quantum accelerations.



For many settings of the parameters, Algorithm 4.4 is superpolynomially faster than Regev's algorithm. It reduces to Regev's algorithm if we have exponentially more quantum time than classical space. However, in real life, classical computation time has only scaled polynomially faster than available classical computer memory. So it is reasonable to consider a future regime in which quantum computers exist, but classical memory is cheaper than quantum time, or is only polynomially more expensive.

Regev [11] established a reduction from certain lattice problems (promise versions of the short vector and close vector problems) to the version of DHSP or hidden shift in which $f(a)$ and $g(a + s)$ are overlapping quantum states. At first glance, our algorithms apply to this type of question. However, we have not found quantum accelerations for these instances. The fundamental reason is that we have trouble competing with classical sieve algorithms for these lattice problems. The classical sieve algorithms work in position space, while our algorithms work in Fourier space, but otherwise the algorithms are similar. Instead, DHSP seems potentially even more difficult than related lattice problems (since that is the direction of Regev's reduction), and the main function of our algorithms is to make DHSP roughly comparable to lattice problems on a quantum computer.



One significant aspect of Algorithm 4.4, and in a way also of Regev's algorithm, is that it solves the hidden subgroup problem for the group $G = D_N$ without staying within the representation theory of $G$ in any meaningful way. It could be interesting to further explore non-representation methods for other hidden structure problems.